# Real-time Processing
Ultravox v0.5 Llama 3.2 1B GGUF
MIT
Ultravox v0.5 is an audio-to-text model built on the Llama 3.2 1B architecture, focusing on efficient speech transcription tasks.
Speech Recognition
ggml-org
421
1
Mediapipe Selfie Segmentation Landscape
Apache-2.0
A lightweight portrait segmentation model in ONNX format, specifically optimized for separating people from backgrounds in landscape images.
Image Segmentation
onnx-community
45
3
Vitpose Base Simple
A lightweight pose estimation model based on the ViT architecture for human keypoint detection.
Pose Estimation
Transformers

onnx-community
31
3
Coreml Sam2 Tiny
Apache-2.0
SAM 2 Tiny is the Core ML version of SAM 2, the general-purpose image and video segmentation model released by FAIR, optimized for mobile applications.
Image Segmentation
apple
15
15
Genrevim Music Detection DistilHuBERT
This model is a fine-tuned audio classification model based on DistilHuBERT, specifically designed to distinguish between music and non-music audio.
Audio Classification
Transformers

MarekCech
61
0
Yolov8n Handwritten Text Detection
An object detection model based on YOLOv8, specifically designed for detecting handwritten text.
Object Detection Other
armvectores
546
9
Trocr Base Plate Number
Apache-2.0
An example vision model for recognizing vehicle license plates, capable of extracting license plate numbers from images.
Text Recognition
Transformers

ghanahmada
100
1
Tiny Random Vits
Apache-2.0
An open-source model released under the Apache-2.0 license; its specific functionality depends on the actual model.
Large Language Model
Transformers

echarlaix
1,835
0
Ssast Audioset Librispeech 16 16
An audio classification model that classifies and recognizes audio data.
Audio Classification
Transformers

yangwang825
18
1
Ast Finetuned Speech Commands V2
A voice command recognition model based on the AST architecture, optimized for web deployment in ONNX format.
Audio Classification
Transformers

Xenova
15
0
Pyannote Speaker Diarization Endpoint
MIT
A speaker diarization model based on pyannote.audio 2.0, used to automatically detect and segment different speakers in audio.
Speaker Analysis
KIFF
1,830
4
Whitebox Cartoonizer
CC
A TensorFlow SavedModel-based white-box cartoonizer model capable of converting real images into cartoon-style images.
Image Generation
sayakpaul
71
22
Whisper Small ISSAI KSC 335RS V2
A small speech recognition model based on the Whisper architecture, suitable for domain-specific speech-to-text tasks.
Speech Recognition
Transformers

Shirali
83
1
Mscoco Finetuned CoCa ViT L 14 Laion2b S13b B90k
MIT
An image-to-text model, released under the MIT license, that converts image content into textual descriptions.
Image-to-Text
laion
21.02k
20
Bert Seg V2
Apache-2.0
An open-source model released under the Apache-2.0 license; its specific functionality depends on the actual model type.
Large Language Model
Transformers

simonnedved
20
0
Unixcoder Base Unimodal
Apache-2.0
An open-source model released under the Apache-2.0 license; its specific functionality and application areas have not been confirmed.
Large Language Model
Transformers

microsoft
23
1
Distil Wav2vec2 Adult Child Cls 37m
Apache-2.0
An audio classification model based on the wav2vec 2.0 architecture, designed to distinguish between adult and child voices
Audio Classification
Transformers English

bookbot
15
2
Wav2vec2 Xls R Tf Left Right Trainer
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m, supporting left-right channel processing.
Speech Recognition
Transformers

hrdipto
30
0
Distilhubert Ft Keyword Spotting
Apache-2.0
A keyword recognition model based on the DistilHuBERT architecture, fine-tuned on the SUPERB dataset, with an accuracy of 97.06%.
Audio Classification
Transformers

anton-l
14
1
Xlm Roberta Base Finetuned Somali
Apache-2.0
Large Language Model
Transformers

Davlan
81
0